ROCm and HIP, a Detailed 10-Chapter Tutorial: The Mindset Shift for GPU Synchronization

The fundamental shift in high-performance GPU computing is the move from a sequential, CPU-centric execution model to a decoupled producer-consumer model. In that model, the CPU orchestrates the flow of work while the GPU executes independently. The key insight is that the GPU is not meant to be driven as a strictly synchronous device. Treating it that way creates a "stop-and-wait" bottleneck: the CPU idles until each command completes, and the GPU idles while the CPU prepares the next one.

1. The Workflow Lifecycle

With an asynchronous mindset, the developer does not stop at the end of each task. Instead, they allocate memory, then launch kernels and copy results back by placing non-blocking requests into a hardware queue (a HIP stream). The host blocks only when it actually needs the results.

2. Overcoming Stalls

When the host is forced to synchronize after every operation, the execution gap (the round-trip time between the CPU and the GPU) dominates performance. With asynchronous execution, the CPU keeps working while the GPU processes its stream, which keeps the hardware saturated.

$$\text{Total time} = \max(\text{CPU work}, \text{GPU work}) + \text{Synchronization overhead}$$
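To make the formula concrete, take hypothetical figures (illustrative, not measured): 4 ms of CPU work, 6 ms of GPU work and 0.5 ms of synchronization overhead. When the two sides overlap, the shorter one is hidden behind the longer:

$$\text{Total time} = \max(4\ \text{ms},\ 6\ \text{ms}) + 0.5\ \text{ms} = 6.5\ \text{ms}$$

If the host instead waits for the GPU before doing its own work, the two terms add rather than overlap. Stop-and-wait code usually pays a synchronization charge after every operation rather than once, so this sum is only a lower bound:

$$\text{Total time}_{\text{stop-and-wait}} \geq 4\ \text{ms} + 6\ \text{ms} + 0.5\ \text{ms} = 10.5\ \text{ms}$$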


QUESTION 1

Which set of steps correctly converts a synchronous vector-add to use an explicit stream?

Call hipStreamCreate, use hipMemcpyAsync with the handle, and pass the handle as the 4th kernel launch parameter (<<<grid, block, sharedMem, stream>>>).

Call hipDeviceSynchronize after every kernel launch and use hipMemcpy.

Set the stream parameter to NULL in all hipMemcpyAsync calls.

Replace hipMalloc with hipHostMalloc exclusively.

QUESTION 2

Why is a GPU considered 'not meant to be driven as a strictly synchronous device'?

Because it has no internal clock.

Because waiting for the CPU to confirm every command leaves thousands of cores idle.

Because memory transfers cannot be tracked by the CPU.

Because the GPU must manage its own power state.

QUESTION 3

What is the primary risk of forcing the host to synchronize after every operation?

Memory corruption.

Host-side stalling and loss of hardware saturation.

Increased power consumption on the GPU.

Kernel compile errors.

QUESTION 4

In the logistics warehouse analogy, what does the 'Conveyor Belt' represent?

A HIP Stream.

The GPU Driver.

The CPU Cache.

The VRAM buffer.

QUESTION 5

True or False: hipMemcpyAsync returns control to the CPU before the data transfer is complete.

True

False